Policy iteration algorithm based on experience replay to solve H∞ control problem of partially unknown nonlinear systems
Authors
Abstract
In this paper, an online adaptive optimal control algorithm based on policy iteration (PI) is developed to solve the H∞ control problem of partially unknown nonlinear continuous-time (CT) systems. The convergence of existing PI algorithms for solving the H∞ control problem is guaranteed under the restrictive persistency of excitation (PE) condition. By using the idea of experience replay, this condition is relaxed here to a simplified rank condition that is easy to verify online. This is achieved by using previously stored data concurrently with current data to update the critic neural network (NN) weights. The proposed algorithm is implemented on an actor-critic-disturbance NN structure, where all NNs are tuned simultaneously to obtain the solution of the Hamilton-Jacobi-Isaacs (HJI) equation without requiring knowledge of the internal system dynamics. The stability of the closed-loop system is guaranteed and convergence to the optimal solution is obtained. Simulation results show the effectiveness of the proposed method.
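The experience replay idea in the abstract can be illustrated with a minimal sketch. It is not the paper's tuning law: it assumes a critic that is linear in its weights, with each sample reduced to a regressor vector `sigma` and a scalar `target`, and it uses a normalized gradient step of the kind common in concurrent-learning schemes. The names `rank`, `replay_rank_ok`, and `critic_step`, the learning rate, and the toy data are all hypothetical.

```python
def rank(rows, tol=1e-9):
    """Rank of a list of row vectors, by Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    rk = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rk, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def replay_rank_ok(stored_regressors, n_weights):
    """Assumed form of the rank condition: stored regressors span the weight space."""
    return rank(stored_regressors) == n_weights


def critic_step(w, current, memory, lr=0.5):
    """One normalized gradient step on the residuals of the current and stored samples.

    Each sample is (sigma, target); its residual is e = w . sigma - target.
    """
    grad = [0.0] * len(w)
    for sigma, target in [current] + memory:
        e = sum(wi * si for wi, si in zip(w, sigma)) - target
        norm = (1.0 + sum(s * s for s in sigma)) ** 2
        for i, s in enumerate(sigma):
            grad[i] += e * s / norm
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Toy data (hypothetical): the "true" critic weights generate the targets.
w_true = [2.0, -1.0]
make = lambda sigma: (sigma, sum(a * b for a, b in zip(w_true, sigma)))
memory = [make([1.0, 0.0]), make([0.0, 1.0]), make([1.0, 1.0])]
current = make([1.0, 0.0])  # the live signal excites only the first weight

memory_ok = replay_rank_ok([s for s, _ in memory], len(w_true))
degenerate_ok = replay_rank_ok([[1.0, 0.0], [2.0, 0.0]], len(w_true))

w_replay = [0.0, 0.0]
w_no_replay = [0.0, 0.0]
for _ in range(200):
    w_replay = critic_step(w_replay, current, memory)
    w_no_replay = critic_step(w_no_replay, current, [])

print("rank condition met:", memory_ok)
print("with replay:", w_replay)
print("without replay:", w_no_replay)
```

In this toy setting the live sample alone never moves the second weight, while the stored samples, once they satisfy the rank check, let both weights be identified. Checking the rank of a finite set of stored vectors is a single computation on recorded data, which is the sense in which such a condition is easier to verify online than PE.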
Similar resources
Adaptive Optimal Control of Partially-unknown Constrained-input Systems using Policy Iteration with Experience Replay
This paper develops an online learning algorithm to find optimal control solutions for partially-unknown continuous-time systems subject to input constraints. The input constraints are encoded into the optimal control problem through a nonquadratic performance functional. An online policy iteration algorithm that uses integral reinforcement knowledge is developed to learn the solution to the op...
Using Modified IPSO-SQP Algorithm to Solve Nonlinear Time Optimal Bang-Bang Control Problem
In this paper, an intelligent-gradient based algorithm is proposed to solve the time-optimal bang-bang control problem. The proposed algorithm combines an intelligent algorithm, the improved particle swarm optimization algorithm (IPSO), in the first stage of the optimization process with a gradient-based algorithm, the successive quadratic programming method (SQP), in the second s...
Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics
In this paper, the optimal adaptive leader-follower consensus of linear continuous time multi-agent systems is considered. The error dynamics of each player depends on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...
Adaptive Approximation-Based Control for Uncertain Nonlinear Systems With Unknown Dead-Zone Using Minimal Learning Parameter Algorithm
This paper proposes an adaptive approximation-based controller for uncertain strict-feedback nonlinear systems with unknown dead-zone nonlinearity. The dead-zone constraint is represented as a combination of a linear system and a disturbance-like term. This work uses neural networks (NNs) as linear-in-parameter approximators to model uncertain nonlinear functions that appear in virtual and act...
Control Stability Evaluation of Multiple Distribution Static Compensators based on Optimal Coefficients using Salp Swarm Algorithm
To address voltage drop and voltage imbalance in distribution systems, reactive power is injected by multiple static compensators. Distributed generation such as photovoltaic systems can play the role of static compensators by producing reactive power. In this paper, the integral to droop line algorithm is used to control the reactive power in busb...
Journal title:
Volume, issue
Pages -
Publication date: 2014